Dropout Training as Adaptive Regularization
Authors
Stefan Wager, Sida Wang, Percy Liang
Abstract
Dropout and other feature noising schemes control overfitting by artificially corrupting the training data. For generalized linear models, dropout performs a form of adaptive regularization. Using this viewpoint, we show that the dropout regularizer is first-order equivalent to an L2 regularizer applied after scaling the features by an estimate of the inverse diagonal Fisher information matrix. We also establish a connection to AdaGrad, an online learning algorithm, and find that a close relative of AdaGrad operates by repeatedly solving linear dropout-regularized problems. By casting dropout as regularization, we develop a natural semi-supervised algorithm that uses unlabeled data to create a better adaptive regularizer. We apply this idea to document classification tasks, and show that it consistently boosts the performance of dropout training, improving on state-of-the-art results on the IMDB reviews dataset.
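To make the first-order equivalence concrete, the following is a minimal NumPy sketch for the logistic-regression case: the quadratic approximation of the dropout penalty weights each squared coefficient by an estimate of the corresponding diagonal entry of the Fisher information, giving a feature-adaptive L2 regularizer. The function names and the variable `delta` (the dropout rate) are illustrative, not taken from the paper's code.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def dropout_quadratic_penalty(X, beta, delta):
    """Quadratic approximation of the dropout regularizer for logistic
    regression: delta/(2(1-delta)) * sum_i p_i(1-p_i) * sum_j x_ij^2 beta_j^2,
    with p_i = sigmoid(x_i . beta). Each beta_j^2 is weighted by an estimate
    of the j-th diagonal entry of the Fisher information, so the penalty
    acts as an L2 regularizer applied to rescaled features."""
    p = sigmoid(X @ beta)                             # fitted probabilities
    var = p * (1.0 - p)                               # GLM variance A''(x_i . beta)
    fisher_diag = (var[:, None] * X**2).sum(axis=0)   # diagonal Fisher estimate
    return delta / (2.0 * (1.0 - delta)) * np.dot(fisher_diag, beta**2)
```

Note that the penalty vanishes at beta = 0 and is largest where the model is uncertain (p near 1/2), which is what makes the regularizer adaptive rather than a fixed ridge penalty.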
Similar papers
Summary and discussion of: “Dropout Training as Adaptive Regularization”
Multi-layered (i.e. deep) artificial neural networks have recently undergone a resurgence in popularity due to improved processing capabilities and the increasing availability of large datasets. Popular in the 1980s and before, they were largely abandoned in favor of convex methods (such as support vector machines) that came with optimality guarantees and often gave better results in far less ...
The dropout learning algorithm
Dropout is a recently introduced algorithm for training neural networks by randomly dropping units during training to prevent their co-adaptation. A mathematical analysis of some of the static and dynamic properties of dropout is provided using Bernoulli gating variables, general enough to accommodate dropout on units or connections, and with variable rates. The framework allows a complete analy...
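A minimal sketch of the Bernoulli gating described here, with a variable keep probability and gates applied either per unit or per connection; the function names and the inverted-scaling convention are choices made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def dropout_units(h, keep_prob, train=True):
    """Unit dropout: one Bernoulli gate per activation, with inverted
    scaling so the expected output matches the test-time forward pass."""
    if not train:
        return h
    gate = rng.binomial(1, keep_prob, size=h.shape)
    return h * gate / keep_prob

def dropout_connections(W, keep_prob):
    """Connection-level gating (as in DropConnect): one Bernoulli gate
    per weight rather than per unit."""
    gate = rng.binomial(1, keep_prob, size=W.shape)
    return W * gate / keep_prob
```

Because the gates are independent Bernoulli variables, the keep probability can vary per layer, per unit, or per connection simply by passing an array for `keep_prob` of the matching shape.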
Compacting Neural Network Classifiers via Dropout Training
We introduce dropout compaction, a novel method for training feed-forward neural networks that realizes the performance gains of training a large model with dropout regularization, yet extracts a compact neural network for run-time efficiency. In the proposed method, we introduce a sparsity-inducing prior on the per-unit dropout retention probability so that the optimizer can effectively prune...
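As an illustration of the idea (not the paper's exact prior or optimizer), the sketch below attaches a retention probability to each hidden unit, adds a sparsity-inducing penalty on those probabilities, and prunes units whose retention probability collapses toward zero:

```python
import numpy as np

def compaction_objective(task_loss, theta, lam):
    """Hypothetical training objective: the task loss plus an L1-style
    penalty on per-unit retention probabilities theta in [0, 1], pushing
    the optimizer to drive unneeded units' theta toward 0."""
    return task_loss + lam * np.sum(theta)

def prune_units(W_out, theta, threshold=0.05):
    """After training, remove hidden units whose retention probability
    fell below the threshold; W_out's columns index hidden units."""
    keep = theta >= threshold
    return W_out[:, keep], theta[keep]
```

The payoff is that the compact network extracted by `prune_units` can be run without any dropout machinery at test time.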
Dropout Training of Matrix Factorization and Autoencoder for Link Prediction in Sparse Graphs
Matrix factorization (MF) and the Autoencoder (AE) are among the most successful approaches to unsupervised learning. While MF-based models have been extensively exploited in the graph modeling and link prediction literature, the AE family has not gained much attention. In this paper we investigate the application of both MF and AE to the link prediction problem in sparse graphs. We show the connection ...
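A hedged sketch of how dropout training can be applied to an autoencoder for link prediction: each adjacency row is corrupted by randomly dropping observed links, and the network learns to reconstruct the full row, so held-out links receive high reconstructed scores. The single-hidden-layer form and all names are assumptions for illustration, not the paper's architecture.

```python
import numpy as np

rng = np.random.default_rng(1)

def corrupt_row(a_row, drop_rate=0.2):
    """Dropout corruption of one adjacency row: zero out a random
    subset of entries so the autoencoder must infer missing links."""
    mask = rng.binomial(1, 1.0 - drop_rate, size=a_row.shape)
    return a_row * mask

def ae_link_scores(a_row, W_enc, W_dec):
    """One-hidden-layer autoencoder forward pass on a corrupted row;
    the sigmoid outputs are per-node link probabilities."""
    h = np.tanh(corrupt_row(a_row) @ W_enc)
    return 1.0 / (1.0 + np.exp(-(h @ W_dec)))
```

Training would minimize a reconstruction loss (e.g. cross-entropy) between `ae_link_scores(a_row, ...)` and the uncorrupted `a_row`, which is the denoising role dropout plays in this setting.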
Shakeout: A New Regularized Deep Neural Network Training Scheme
Recent years have witnessed the success of deep neural networks in dealing with a wide range of practical problems. The invention of effective training techniques has contributed substantially to this success. The so-called "Dropout" training scheme is one of the most powerful tools for reducing over-fitting. From a statistical point of view, Dropout works by implicitly imposing an L2 regularizer on the weights...
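The implicit-L2 claim can be checked numerically in the simplest setting, a single linear-regression example under inverted dropout (this sketch illustrates plain dropout, not Shakeout itself; all names are illustrative). The expected noised loss equals the clean squared loss plus a data-dependent L2 penalty:

```python
import numpy as np

rng = np.random.default_rng(2)
delta = 0.5                                   # dropout rate
x = rng.normal(size=5)                        # one training example
beta = rng.normal(size=5)                     # model weights
y = 1.0

# Monte Carlo estimate of the expected squared loss under inverted dropout.
masks = rng.binomial(1, 1.0 - delta, size=(200_000, 5)) / (1.0 - delta)
mc = np.mean((y - (masks * x) @ beta) ** 2)

# Closed form: clean squared loss + delta/(1-delta) * sum_j (x_j beta_j)^2.
closed = (y - x @ beta) ** 2 + delta / (1.0 - delta) * np.sum((x * beta) ** 2)

print(mc, closed)   # agree up to Monte Carlo error
```

The extra term is an L2 penalty whose weight on each beta_j scales with x_j^2, a data-adaptive form of regularization; Shakeout extends this by inducing an L1 component as well.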